Cloud-native global file system with constant-time rekeying

ABSTRACT

A cloud-native global file system in which a local filer creates objects and forwards them to a cloud-based object store is augmented to include constant-time rekeying (CTR). At volume creation time on the filer, a random Intermediate Key (IK) is generated. The IK is encrypted using one or more public key(s) for the volume in question, and then stored in encrypted form in a volume metadata file (e.g., cloudvolume.xml) alongside the other volume information. Once created, the IK is treated like any other volume metadata. During startup of a volume manager on the filer, the one or more per-volume IK blobs (present) are decrypted using an appropriate secret key, and then cached in memory. All objects sent to the cloud are then symmetrically encrypted to the current IK for that volume. All objects read from the cloud are decrypted using the locally-cached IK.

BACKGROUND OF THE INVENTION

Technical Field

This application relates generally to data storage.

Background of the Related Art

It is known to provide a cloud-native global file system that is used to provide primary file storage for enterprise data. In this approach, edge appliances (or “filers”) typically located on-premises securely transmit all files, file versions and metadata to a preferred private or public cloud object store, while locally caching only active files. The appliances are stateless, and multiple appliances can mount the same volume in the cloud. As files are written locally, an authoritative copy of every file and metadata (inodes) are stored in the cloud. The system provides a single, unified namespace for all primary file data that is not bound by local hardware or network performance constraints. The above-described approach to enterprise file services also has been extended to provide multiple-site/multiple-filer access to the same namespace, thereby enabling participating users with the ability to collaborate on documents across multiple filers/sites. A system of this type is available commercially from Nasuni® Corporation of Boston, Mass.

In a system such as described above, preferably all data written to the cloud is encrypted. In one approach, an object (a blob of data) is wrapped in a data packet, which is then compressed inside a compressed data packet. That compressed data packet is then encrypted to a random 256-bit AES-256 key called a session key (SK). In turn, the AES-256 session key is then encrypted using a public key for the volume that owns the blob. This public key encrypted session key (PKESK) is then prepended to the encrypted blob. In this manner, each object in the cloud contains the key for its own decryption, but only if the secret key is available to decrypt the session key. This significantly simplifies the need to store per-object keys. When the object needs to be decrypted, the PKESK is examined and a determination made whether there exists a secret key that matches the key ID of the public key originally used to encrypt the PKESK. If so, that secret key is used to decrypt the PKESK, revealing the AES-256 session key. That session key is then used to decrypt the encrypted blob, revealing a compressed data packet, which in turn is decompressed into a literal packet, which finally is then unwrapped into the original blob of data.

In OpenPGP terms, each object in the cloud is a full RFC-4880 message, with the innermost piece being the data blob, which is inside the literal data packet, which is inside the compressed data packet, which is inside the encrypted data packet. In this approach, every encrypted data packet is encrypted to a random AES-256 key, which effectively means that every object in the cloud is symmetrically encrypted to a random key.

While the above-described approach is highly secure, rekeying can present challenges. Rekeying refers to the situation when a service customer decides to change the key used on a volume. There can be several reasons for rekeying, including key compromise (the key is stolen, or more simply someone who has access to the key leaves the company). Some customers, like banks, may also have regulatory requirements around the need to rekey. To effect rekeying, the PK/SK pair used to encrypt and decrypt the PKESKs is changed, but typically rekeying only applies to new objects in the cloud. Existing objects at rest are not affected and continue to be encrypted to the old key. As objects are deleted, modified, and added, the new key is used more and more, but even if the customer has significant churn of their entire data set (unlikely), and has pruning enabled to remove old data, there will always be a percentage of data that remains encrypted to the old key. For the archive use case, the old key is likely to remain on a majority of data.

It would be desirable to provide a technique to rekey all objects in the cloud but without the need to manipulate all of these objects.

BRIEF SUMMARY

According to this disclosure, constant-time rekeying (CTR) is enabled by implementing a new key management technique. In one embodiment, and at volume creation time, a random Intermediate Key (IK) is generated. The IK is encrypted using one or more public key(s) for the volume in question, and then stored in encrypted form in a volume metadata file (e.g., cloudvolume.xml) alongside the other volume information. Once created, the encrypted IK is treated like any other volume metadata. By encrypting the IK to the specific volume key, the IK for a given volume cannot be revealed unless the customer private key pair (PK/SK) for that particular volume also is present. During startup of a volume manager on the filer, the one or more per-volume IK blobs (present) are decrypted using an appropriate secret key, and then cached in memory. The memory chunk containing the cached key is marked as unswappable (e.g., via mlock or similar) to help prevent it from leaking. All objects sent to the cloud are then symmetrically encrypted to the cached IK for that volume. All objects read from the cloud are symmetrically decrypted using the locally-cached IK for that volume. To enable filers that share the volume (using the PK/SK) to access the key, the encrypted IK is added to replication metadata (e.g., replication.xml) so it can be distributed to the remote filers mounting the volume in question. Once a remote filer receives the IK (e.g., via replication.xml), it can be inserted into that filer's metadata file (cloudvolume.xml). When replicated, the IK is still encrypted, so it will need to be decrypted using the shared PK/SK.
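By way of illustration only, the IK lifecycle described above (generate at volume creation, store only in wrapped form, unwrap and cache at volume manager startup, then encrypt and decrypt objects with the cached IK) may be sketched in Python as follows. Everything named here is an assumption made for the sketch: `toy_cipher` is an insecure stand-in for AES-256, the `wrap` and `unwrap` callables are stand-ins for encryption to the volume public key and decryption with the volume secret key, and the dictionary stands in for the metadata file. Marking the cached key unswappable (e.g., via mlock) is not shown.

```python
import hashlib
import secrets
from typing import Callable, Dict


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Insecure XOR keystream used ONLY as a stand-in for AES-256.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def create_volume_metadata(wrap: Callable[[bytes], bytes]) -> Dict[str, bytes]:
    """Volume creation: generate a random IK and keep only its wrapped form.

    `wrap` is a hypothetical stand-in for encryption to the volume public key.
    The returned dict stands in for the volume metadata file.
    """
    intermediate_key = secrets.token_bytes(32)
    return {"encrypted_ik": wrap(intermediate_key)}


class VolumeManager:
    """Hypothetical volume manager that caches the decrypted IK in memory."""

    def __init__(self, metadata: Dict[str, bytes],
                 unwrap: Callable[[bytes], bytes]) -> None:
        # Startup: decrypt the IK blob with the secret key and cache it.
        self._cached_ik = unwrap(metadata["encrypted_ik"])

    def encrypt_object(self, blob: bytes) -> bytes:
        # Objects sent to the cloud are symmetrically encrypted to the IK.
        return toy_cipher(self._cached_ik, blob)

    def decrypt_object(self, data: bytes) -> bytes:
        # Objects read from the cloud are decrypted with the cached IK.
        return toy_cipher(self._cached_ik, data)
```

In this sketch, a second filer that receives the same wrapped IK and holds the same volume key can construct its own `VolumeManager` and read objects written by the first, which mirrors the replication behavior described above.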

If and when a customer decides it wants to rekey, it uploads a new PK/SK key pair. Internally, a new IK is generated and encrypted to the new PK/SK. The encrypted IK is given to the volume manager, which then adds it to the metadata file (cloudvolume.xml) and caches it in memory for use encrypting and decrypting the volume just as on new volume creation. The same encrypted IK is made available (e.g., in replication.xml) for other filers to use. The other filers will request the PK/SK for this encrypted IK using a key sharing mechanism. Theoretically, any filer can run the rekey process; however, because typically there is a master filer for a given volume, the rekeying preferably is carried out on the master filer.

According to a further feature, an existing volume may also be selectively converted to constant-time rekeying.

The foregoing has outlined some of the more pertinent features of the subject matter. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed subject matter in a different manner or by modifying the subject matter as will be described.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram illustrating how a known versioned file system interfaces a local file system to an object-based data store;

FIG. 2 is a block diagram of a representative implementation of a portion of the interface shown in FIG. 1;

FIG. 3 is a more detailed implementation of the interface where there are a number of local file systems of different types;

FIG. 4 illustrates the interface implemented as an appliance within a local processing environment;

FIG. 5 describes further details of a versioned file system in which the techniques of this disclosure may be implemented; and

FIG. 6 depicts a process flow of a constant-time rekeying (CTR) technique of this disclosure.

DETAILED DESCRIPTION

FIG. 1 illustrates a local file system 100 and an object-based data store 102. Although not meant to be limiting, preferably the object-based data store 102 is a “write-once” store and may comprise a “cloud” of one or more storage service providers. An interface 104 (or “filer”) provides for a “versioned file system” that only requires write-once behavior from the object-based data store 102 to preserve substantially its “complete” state at any point-in-time. As used herein, the phrase “point-in-time” should be broadly construed, and it typically refers to periodic “snapshots” of the local file system (e.g., once every “n” minutes). The value of “n” and the time unit may be varied as desired. The interface 104 provides for a file system that has complete data integrity to the cloud without requiring global locks. In particular, this solution circumvents the problem of a lack of reliable atomic object replacement in cloud-based object repositories. The interface 104 is not limited for use with a particular type of back-end data store. When the interface is positioned in “front” of a data store, the interface has the effect of turning whatever is behind it into a “versioned file system” (“VFS”). The VFS is a construct that is distinct from the interface itself, and the VFS continues to exist irrespective of the state or status of the interface (from which it may have been generated). Moreover, the VFS is self-describing, and it can be accessed and managed separately from the back-end data store, or as a component of that data store. Thus, the VFS (comprising a set of structured data representations) is location-independent. In one embodiment, the VFS resides within a single storage service provider (SSP) although, as noted above, this is not a limitation. In another embodiment, a first portion of the VFS resides in a first SSP, while a second portion resides in a second SSP. Generalizing, any given VFS portion may reside in any given data store (regardless of type), and multiple VFS portions may reside across multiple data store(s). The VFS may reside in an “internal” storage cloud (i.e., a storage system internal to an enterprise), an external storage cloud, or some combination thereof.

The interface 104 may be implemented as a machine. A representative implementation is the Nasuni® Filer, available from Nasuni® Corporation of Boston, Mass. Thus, for example, typically the interface 104 is a rack-mounted server appliance comprising hardware and software. The hardware typically includes one or more processors that execute software in the form of program instructions that are otherwise stored in computer memory to comprise a “special purpose” machine for carrying out the functionality described herein. Alternatively, the interface is implemented as a virtual machine or appliance (e.g., via VMware®, or the like), as software executing in a server, or as software executing on the native hardware resources of the local file system. The interface 104 serves to transform the data representing the local file system (a physical construct) into another form, namely, a versioned file system comprising a series of structured data representations that are useful to reconstruct the local file system to any point-in-time. A representative VFS is the Nasuni Unity File System (UniFS™). Although not meant to be limiting, preferably each structured data representation is an XML document (or document fragment). As is well-known, extensible markup language (XML) facilitates the exchange of information in a tree structure. An XML document typically contains a single root element (or a root element that points to one or more other root elements). Each element has a name, a set of attributes, and a value consisting of character data, and a set of child elements. The interpretation of the information conveyed in an element is derived by evaluating its name, attributes, value and position in the document.

The interface 104 generates and exports to the write-once data store a series of structured data representations (e.g., XML documents) that together comprise the versioned file system. The data representations are stored in the data store. Preferably, the XML representations are encrypted before export to the data store. The transport may be performed using known techniques. In particular, REST (Representational State Transfer) is a lightweight architectural style commonly used for exchanging structured data and type information on the Web. Another option is Simple Object Access Protocol (SOAP), an XML-based messaging protocol. Using REST, SOAP, or some combination thereof, XML-based messages are exchanged over a computer network, normally using HTTP (Hypertext Transfer Protocol) or the like. Transport layer security mechanisms, such as HTTP over TLS (Transport Layer Security), may be used to secure messages between two adjacent nodes. An XML document and/or a given element or object therein is addressable via a Uniform Resource Identifier (URI). Familiarity with these technologies and standards is presumed.

FIG. 2 is a block diagram of a representative implementation of how the interface captures all (or given) read/write events from a local file system 200. In this example implementation, the interface comprises a file system agent 202 that is positioned within a data path between a local file system 200 and its local storage 206. The file system agent 202 has the capability of “seeing” all (or some configurable set of) read/write events output from the local file system. The interface also comprises a content control service (CCS) 204 as will be described in more detail below. The content control service is used to control the behavior of the file system agent. The object-based data store is represented by the arrows directed to “storage” which, as noted above, typically comprises any back-end data store including, without limitation, one or more storage service providers. The local file system stores local user files (the data) in their native form in cache 208. Reference numeral 210 represents that portion of the cache that stores pieces of metadata (the structured data representations, as will be described) that are exported to the back-end data store (e.g., the cloud).

FIG. 3 is a block diagram illustrating how the interface may be used with different types of local file system architectures. In particular, FIG. 3 shows the CCS (in this drawing a Web-based portal) controlling three (3) FSA instances. Once again, these examples are merely representative and they should not be taken to limit the invention. In this example, the file system agent 306 is used with three (3) different local file systems: NTFS 300 executing on a Windows operating system platform 308, MacFS (also referred to as “HFS+” (HFSPlus)) 302 executing on an OS X operating system platform 310, and EXT3 or XFS 304 executing on a Linux operating system platform 312. These local file systems may be exported (e.g., via CIFS, AFP, NFS or the like) to create a NAS system based on VFS. Conventional hardware, or a virtual machine approach, may be used in these implementations, although this is not a limitation. As indicated in FIG. 3, each platform may be controlled from a single CCS instance 314, and one or more external storage service providers may be used as an external object repository 316. As noted above, there is no requirement that multiple SSPs be used, or that the data store be provided using an SSP.

FIG. 4 illustrates the interface implemented as an appliance within a local processing environment. In this embodiment, the local file system traffic 400 is received over Ethernet and represented by the arrow identified as “NAS traffic.” That traffic is provided to smbd layer 402, which is a SAMBA file server daemon that provides CIFS (Windows-based) file sharing services to clients. The layer 402 is managed by the operating system kernel 404 in the usual manner. In this embodiment, the local file system is represented (in this example) by the FUSE kernel module 406 (which is part of the Linux kernel distribution). Components 400, 402 and 404 are not required to be part of the appliance. The file transfer agent 408 of the interface is associated with the FUSE module 406 as shown to intercept the read/write events as described above. The CCS (as described above) is implemented by a pair of modules (which may be a single module), namely, a cache manager 410, and a volume manager 412. Although not shown in detail, preferably there is one file transfer agent instance 408 for each volume of the local file system. The cache manager 410 is responsible for management of “chunks” with respect to a local disk cache 414. This enables the interface described herein to maintain a local cache of the data structures (the structured data representations) that comprise the versioned file system. The volume manager 412 maps the root of the FSA data to the cloud (as will be described below), and it further understands the one or more policies of the cloud storage service providers. The volume manager also provides the application programming interface (API) to these one or more providers and communicates the structured data representations (that comprise the versioned file system) through a transport mechanism 416 such as cURL. cURL is a library and command line tool for transferring files with URL syntax that supports various protocols such as FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS and FILE. cURL also supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, proxy tunneling, and the like. The structured data representations preferably are encrypted and compressed prior to transport by the transformation module 418. The module 418 may provide one or more other data transformation services, such as duplicate elimination. The encryption, compression, duplicate elimination and the like, or any one of such functions, are optional. A messaging layer 420 (e.g., local socket-based IPC) may be used to pass messages between the file system agent instances, the cache manager and the volume manager. Any other type of message transport may be used as well.

The interface shown in FIG. 4 may be implemented as a standalone system, or as a managed service. In the latter case, the system executes in an end user (local file system) environment. A managed service provider provides the system (and the versioned file system service), preferably on a fee or subscription basis, and the data store (the cloud) typically is provided by one or more third party service providers. The versioned file system may have its own associated object-based data store, but this is not a requirement, as its main operation is to generate and manage the structured data representations that comprise the versioned file system. The cloud preferably is used just to store the structured data representations, preferably in a write-once manner, although the “versioned file system” as described herein may be used with any back-end data store.

As described above, the file system agent 408 is capable of completely recovering from the cloud (or other store) the state of the native file system and providing immediate file system access (once FSA metadata is recovered). The FSA can also recover to any point-in-time for the whole file system, a directory and all its contents, a single file, or a piece of a file. These and other advantages are provided by the “versioned file system” of this disclosure, as is now described in more detail below.

For more details concerning the filer as described above, the disclosure of U.S. Pat. No. 9,575,841 is hereby incorporated by reference.

FIG. 5 is a block diagram that illustrates a system 500 for managing a versioned file system (as described above) that also includes the capability of global locking. The system 500 includes an interface 510 in communication with local traffic 520, a web-based portal 530, a local cache 540, a lock server 550, and cloud storage 560. The interface 510 includes an SMBD layer 502, an NFSD layer 504, a FUSE module 506, an FSA 508, a cache manager 512, a volume manager 514, a lock daemon 516, a transport layer 518, and an administrative module 522. In some embodiments, the interface 510 is the same as the interface described with respect to FIG. 4 but with the addition of the lock daemon 516.

SMB/CIFS lock requests are intercepted by SMBD layer 502, which is a SAMBA file server daemon. An optional Virtual File System (VFS) module can extend the SAMBA server daemon to send the local lock information to the FSA 508. FSA 508 then communicates with FUSE 506 to coordinate the FUSE file descriptors (pointers) with the ioctl information to determine a path for the given file(s) associated with the lock request. Assuming a path is enabled for global locking, FSA 508 sends the lock and path to the lock daemon 516, which handles the lock request as described below. If a path is not enabled for global locking, the lock request stays within the SAMBA server as it did previously (e.g., conflict management, etc. as described above) and it is not sent to the lock daemon 516.

NFS lock requests are passed through the NFSD layer 504 to FUSE 506. Assuming a path prefix is enabled for global locking, FSA 508 communicates with the lock daemon 516 to handle the lock request using a common protocol, as described above. If the path prefix is not enabled for global locking, FSA 508 handles the lock request as it did previously (e.g., conflict management, etc. as described above) and the lock request is not sent to the lock daemon 516.
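For purposes of illustration only, the routing decision described in the two preceding paragraphs (a lock request goes to the lock daemon only when its path falls under a prefix enabled for global locking) may be sketched as follows. The function names, the return labels and the notion of a configured prefix list are assumptions of this sketch and are not taken from the disclosure.

```python
from typing import Iterable


def is_enabled_for_global_locking(path: str,
                                  enabled_prefixes: Iterable[str]) -> bool:
    """Return True if `path` lies under any prefix enabled for global locking.

    Both sides are normalized to end in "/" so that "/vol/shared" does not
    accidentally match "/vol/sharedx".
    """
    normalized = path.rstrip("/") + "/"
    return any(normalized.startswith(prefix.rstrip("/") + "/")
               for prefix in enabled_prefixes)


def route_lock_request(path: str, enabled_prefixes: Iterable[str]) -> str:
    """Decide where a lock request is handled (hypothetical labels)."""
    if is_enabled_for_global_locking(path, enabled_prefixes):
        return "lock_daemon"   # forwarded for global lock handling
    return "local"             # stays with the local SMB/NFS lock handling
```

The same check serves both protocols in this sketch, since the text applies the enabled-path test to SMB/CIFS and NFS requests alike.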

The lock daemon 516 is responsible for local lock management and coordinating with the global lock server. The lock daemon 516 can perform one or more of the following functions: (a) translating the lock format; (b) communicating with the centralized lock server; (c) acquiring locks; (d) lock peeking; (e) lock re-acquiring; (f) lock releasing; and (g) communicating with the filer.

With respect to translating the lock format, the lock daemon 516 can translate the local file lock requests to a common lock format understood by the centralized lock server 550 (described below). Using this approach, the lock server 550 receives a lock request in one format regardless of the underlying network protocol (e.g., SMB/CIFS or NFS). The centralized lock server 550 can be in a network operations center (NOC) 555.

The lock daemon 516 can then communicate with the centralized lock server 550 by making calls to a Centralized Lock API. Through the API, the lock daemon 516 can execute a lock request, an unlock request, and/or a lock break request. A lock request generally requires the transmission of certain information such as the first handle (a unique identifier to the original base object for the file), the requested lock mode, the file path, the protocol of the requester, etc. Additional information such as timestamps and serial number can be included in the lock request. The requested lock mode is the type of access for the lock, such as a shared or exclusive lock, a lock for read, a lock for write, a lock for exclusive write, or a lock for shared write. If the centralized lock server 550 grants the lock request, the lock server 550 then uses information provided in the lock request (e.g., the first handle) to retrieve the latest version of the requested file from cloud storage 560. The centralized lock server 550 transmits the latest version of the requested file to the lock daemon 516, which can store the file in local cache 540.
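As a non-limiting illustration, the information carried by a lock request and its translation into a protocol-independent form may be sketched as below. The field names follow the items listed above; the class names, the particular lock-mode values and the dictionary layout of the common format are hypothetical, as the disclosure does not specify a wire format.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class LockMode(Enum):
    """Illustrative subset of lock modes; the actual set is not specified."""
    SHARED_READ = "shared_read"
    SHARED_WRITE = "shared_write"
    EXCLUSIVE_WRITE = "exclusive_write"


@dataclass(frozen=True)
class LockRequest:
    first_handle: str                     # unique id of the original base object
    lock_mode: LockMode                   # requested type of access
    file_path: str
    protocol: str                         # e.g. "smb" or "nfs"
    timestamp: Optional[float] = None     # optional additional information
    serial_number: Optional[int] = None   # optional additional information


def to_common_format(request: LockRequest) -> dict:
    """Translate a local lock request into one hypothetical common format,
    so the lock server sees the same shape regardless of protocol."""
    payload = asdict(request)
    payload["lock_mode"] = request.lock_mode.value
    payload["protocol"] = request.protocol.upper()
    return payload
```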

An unlock request can include the same or similar information as the lock request but with an updated handle name that was generated as a result of modifications to the locked file. A lock break request can be provided by a system administrator to manually unlock a file (e.g., if a user leaves a locked file open overnight, a server goes down, etc.).

Prior to making a new lock request, the lock daemon 516 determines whether a lock already exists in local cache 540 or on the centralized lock server 550. If no lock exists in either of those locations, the lock daemon 516 acquires a new lock through the centralized lock server 550. The new lock can have a lock mode computed using the requested access and share profiles (masks).

Lock peeking can be initiated every time a file is opened for read. In lock peeking, the lock daemon 516 can query whether a lock exists on the file prior to opening the file. If a lock exists, the lock daemon 516 can also determine the associated lock mode to evaluate whether the lock mode permits the user to open the file. The lock daemon 516 retrieves this information from local lock cache 540 if the filer requesting the lock peek already has a write lock on the file. Otherwise, the lock daemon 516 retrieves this information from the centralized lock server 550. Each lock peek request can be cached in the local lock cache 540 for a short time period (e.g., several seconds) to reduce traffic to the central lock server 550 if the lock daemon 516 receives a new lock peek request shortly after the first lock peek request.
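The short-lived caching of lock peek results described above may be sketched, by way of example only, as a small time-to-live cache. The class name, the five-second default and the `query_server` callable (standing in for a call to the centralized lock server) are assumptions of this sketch; the case in which the filer already holds a write lock is not modeled.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LockPeekCache:
    """Caches lock peek results for a few seconds to reduce server traffic."""

    def __init__(self,
                 query_server: Callable[[str], Optional[str]],
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query_server = query_server   # hypothetical lock server call
        self._ttl = ttl_seconds
        self._clock = clock                 # injectable for testing
        # path -> (time the result was cached, lock mode or None)
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def peek(self, path: str) -> Optional[str]:
        """Return the lock mode on `path`, or None if no lock exists."""
        now = self._clock()
        cached = self._entries.get(path)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]                # fresh enough: skip the server
        mode = self._query_server(path)
        self._entries[path] = (now, mode)
        return mode
```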

For example, another user may have a lock for exclusive write access to the file that does not allow any shared access (i.e., no shared read access). In this example, the lock daemon 516 determines from the lock query that the file cannot be opened due to an existing lock on the file. In another example, the lock mode can allow shared read or write access in which case the lock daemon 516 determines from the lock query that the file can be opened.

During lock peeking, the lock daemon 516 can also retrieve additional information about the file, such as the file handle, handle version, first handle, and lock push version. The file handle is a pointer to the latest version of the file in the cloud. The handle version is a version of the file in the cloud. The first handle provides a unique identifier to the file across versions and renames of the file. The lock push version is the latest version of the file that was sent to the cloud.

The lock daemon 516 can cache locks and unlocks in a local lock cache 540 for release to the centralized lock server 550. If a lock request is made for a file that has a cached unlock request, the lock can be reestablished without having to acquire a new lock from the centralized lock server 550. In such a situation, the unlock request is cancelled. This caching can reduce load on the lock server 550 and improve response time. In general, the unlock requests are cached for a certain period of time prior to release to the lock server 550 to allow for such lock reestablishment.
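A minimal sketch of the deferred-unlock behavior just described follows. It assumes a hypothetical `release_to_server` callable for the eventual release and a ten-second hold period; neither the names nor the period come from the disclosure.

```python
import time
from typing import Callable, Dict, List


class UnlockCache:
    """Holds unlock requests briefly so a quick re-lock can cancel them."""

    def __init__(self,
                 release_to_server: Callable[[str], None],
                 delay_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._release = release_to_server   # hypothetical lock server call
        self._delay = delay_seconds
        self._clock = clock                 # injectable for testing
        self._pending: Dict[str, float] = {}  # path -> time unlock was cached

    def unlock(self, path: str) -> None:
        """Cache the unlock instead of sending it immediately."""
        self._pending[path] = self._clock()

    def try_relock(self, path: str) -> bool:
        """Cancel a cached unlock, reestablishing the lock locally.

        Returns False if no unlock is cached, in which case a new lock
        would have to be acquired from the server.
        """
        return self._pending.pop(path, None) is not None

    def flush(self) -> List[str]:
        """Release every unlock that has been cached for the full delay."""
        now = self._clock()
        due = [p for p, t in self._pending.items() if now - t >= self._delay]
        for path in due:
            del self._pending[path]
            self._release(path)
        return due
```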

As discussed above, the lock request includes information on the protocol (e.g., SMB/CIFS or NFS) of the requester and the lock mode. The lock server 550 receives this information and can determine, based on any existing lock(s) on the requested file, whether the lock server 550 can issue multiple locks on the same file. The lock server 550 can evaluate the protocol used by the requester of the existing lock and the associated access/share permissions of that lock and determine whether the protocol used with the new lock requester is compatible.

In addition, the lock daemon 516 handles lock releases. In some embodiments, the lock daemon 516 does not immediately send the lock release to the lock server 550. This time delay can reduce load on the centralized lock server 550 because files are frequently locked and unlocked in rapid succession, as discussed above. Before a lock is released, if the file was changed, the current data is sent to cloud storage 560 (e.g., Amazon S3, Microsoft Azure, or other public or private clouds) so the most recent data is available to the next locker.

Finally, the lock daemon 516 can communicate with the FSA 508. The lock daemon 516 can receive lock requests and/or lock peek requests from FSA 508, which the lock daemon 516 translates into a common protocol for transmission to the centralized lock server 550, as discussed above. The lock daemon can also pass the updated handle name to the FSA 508 to perform a file-level snapshot before unlocking a file and/or a file level merge/synchronization before locking a file.

For global locking, it is desirable for the locker to have the most recent version of the file associated with the lock request (and lock grant). To accomplish this, the cache manager 512 can be configured to snapshot a single file (e.g., the file associated with the lock request) without triggering a copy-on-write (COW) event (which would cause a version update, as discussed above) and without affecting other snapshot operations. After a single file snapshot, the cache manager 512 can mark all parent directories of the file as changed or “dirty.” In addition, the fault manager algorithm can be configured to fault a single file based on requests from the FSA 508.

The merge/push algorithm can be modified to provide for merging single files. Before the locked file is pushed to the local cache 540, the NOC 555 assigns a unique lock version (e.g., 64 bit) to the file. The lock version can be used by FSA 508 to determine whether a locked file or its metadata is dirty (i.e., changed). The parent directories of the locked file can continue to use the existing write version assigned from the last TOC. Thus, FSA 508 can track two values: lock_write_version and lock_push_version. When a file or directory is dirtied, the lock_write_version is updated. When a file or directory is pushed to local cache 540, the lock_push_version is updated.

As discussed above, the file data from the NOC 555 (or centralized lock server 550) is merged into the local cache 540 before the FSA 508 returns control of the file to the client. To determine if the file data in the NOC 555 is newer than the file data in the cache 540 (e.g., if the lock is retrieved while an unlock request is cached), the FSA checks MAX (lock_write_version, lock_push_version) against the NOC lock version. If the NOC lock version is greater than the lock_write_version and the lock_push_version, the file data (object metadata and data) from the NOC 555 is used to instantiate the object (locked file) in the local cache 540. If the file data in the cache 540 is newer, then the file data from the NOC 555 is discarded. In the circumstance where the NOC 555 indicates that the file is deleted, the delete version is compared to the local cache 540 version in order to apply the delete to the local cache 540.
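The version comparison described above reduces to a single test, sketched here for illustration. The function name and return labels are hypothetical. The text does not say what happens when the versions are equal; this sketch assumes the cached copy is kept in that case.

```python
def merge_decision(noc_lock_version: int,
                   lock_write_version: int,
                   lock_push_version: int) -> str:
    """Decide whether NOC file data replaces the locally cached object.

    The NOC data is used only when its lock version is strictly greater
    than MAX(lock_write_version, lock_push_version). Ties keep the cache,
    which is an assumption of this sketch.
    """
    local_version = max(lock_write_version, lock_push_version)
    if noc_lock_version > local_version:
        return "use_noc_data"       # instantiate the object from NOC data
    return "discard_noc_data"       # cache is at least as new


# Example: the cache was dirtied at version 12, so NOC version 10 is stale.
decision = merge_decision(noc_lock_version=10,
                          lock_write_version=12,
                          lock_push_version=9)
```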

In addition, the merge/push algorithm can be modified to reconcile the single-file merges of locked files with the snapshot merges of files. Any file that was “fastsynched” through the FSA 508 (i.e., locked) or “fastpushed” to the cloud (i.e., unlocked) is designated as “cloud fastsynced.” When merging an object or file that is considered “cloud dirty” or “cloud fastsynched,” the FSA 508 will update the file if the incoming lock_push_version is greater than MAX (lock_write_version, lock_push_version), as discussed above. If the incoming lock_push_version is less than MAX (lock_write_version, lock_push_version), the cache object is considered newer and the incoming update is discarded by the FSA 508. Also, when a file is missing (deleted) from the pushed version but the file is also locally fastsynched, the file will not be deleted. This merging can occur concurrently or before the global lock on the file is granted.

In addition, if a file has been deleted or renamed, the local cache metadata can record a “delete tombstone” which includes certain information (e.g., parent first handle, lock version, name, etc.). FSA 508 merges a file as new if the file is newer than any delete tombstone contained in the cache for the unique file. This can address the situation in which a file has been fast synchronized before merge. In that case, the incoming cloud dirty file is old compared to the cache and the import is discarded.

To ensure that the unlocked file includes the changes from the latest version, the locked file can only be unlocked when the lock_push_version is greater than or equal to the lock_write_version, at which point the FSA 508 sends the lock_push_version back to the NOC 555 (or centralized lock server 550) to store the new version of the file in cloud storage 560.
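The unlock condition just described reduces to a single comparison. The following minimal sketch assumes integer version counters; the function name version_to_report is hypothetical.

```python
from typing import Optional


def version_to_report(lock_write_version: int, lock_push_version: int) -> Optional[int]:
    """Return the lock_push_version to send back if the file may be unlocked.

    Returns None while the latest written version has not yet been pushed,
    in which case the file stays locked.
    """
    if lock_push_version >= lock_write_version:
        return lock_push_version
    return None
```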

In some embodiments, the interface 510 snapshots and merges new files at the time of creation. The new file requests can be stored on the lock server 550 with the lock entries. Other users can poll the lock server 550 to determine if new files/objects exist that have not yet been populated to the cloud 560, for example if there are new files/objects in a given directory. After the new files have been created, the lock server 550 can merge the new file requests into the appropriate directories in the cloud 560.

The filers may be anywhere geographically, and no network connectivity between or among the filers is required (provided filers have a connection to the service).

Sharing enables multi-site access to a single shared volume. The data in the volume is 100% available, accessible, secure and immutable. The approach has infinite scalability and eliminates local capacity constraints. The sites (nodes) may comprise a single enterprise environment (such as geographically-distributed offices of a single enterprise division or department), but this is not a requirement, as filers are not required to comprise an integrated enterprise. This enables partners to share the filesystem (and thus particular volumes therein) in the cloud. Using the service provider-supplied interfaces, which are preferably web-based, the permitted users may set up a sharing group and manage it. Using the sharing approach as described, each member of the sharing group in effect “sees” the same volume. Thus, any point-in-time recovery of the shared volume is provided, and full read/write access is enabled from each node in the sharing group.

Object Security and Rekeying

As has been described above, preferably all data written to the cloud is encrypted. In one approach, an object (a blob of data) is wrapped in a literal data packet, which is then compressed inside a compressed data packet. That compressed data packet is then encrypted to a random AES-256 key called a session key (SK). In turn, the AES-256 session key is then encrypted using a public key for the volume that owns the blob. This public key encrypted session key (PKESK) is then prepended to the encrypted blob. In this manner, each object in the cloud contains the key for its own decryption, but that key is usable only if the secret key is available to decrypt the session key. When the object needs to be decrypted, the PKESK is examined and a determination is made whether there exists a secret key that matches the key ID of the public key originally used to create the PKESK. If so, that secret key is used to decrypt the PKESK, revealing the AES-256 session key. That session key is then used to decrypt the encrypted blob, revealing a compressed data packet, which in turn is decompressed into a literal data packet, which finally is then unwrapped into the original blob of data.
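The layering order described above (literal packet, then compression, then encryption to a random session key, then a wrapped session key prepended with a key ID) may be illustrated by the following sketch. It is not an OpenPGP implementation and makes several loudly stated assumptions: keystream_xor is a toy stream cipher built from SHA-256 that merely stands in for AES-256; the session key is wrapped with the same toy cipher under a single volume key, which stands in for public-key encryption and is not public-key cryptography; and the LIT tag, the 8-byte key ID and the byte layout are hypothetical.

```python
import hashlib
import os
import zlib

KEY_ID_LEN = 8   # hypothetical key ID length
NONCE_LEN = 16
LITERAL_TAG = b"LIT"  # hypothetical marker standing in for a literal packet header


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT AES; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def key_id(volume_key: bytes) -> bytes:
    """Hypothetical key ID: a truncated hash of the (stand-in) volume key."""
    return hashlib.sha256(volume_key).digest()[:KEY_ID_LEN]


def encrypt_object(blob: bytes, volume_key: bytes) -> bytes:
    literal = LITERAL_TAG + blob                      # wrap in a literal packet
    compressed = zlib.compress(literal)               # compressed data packet
    session_key = os.urandom(32)                      # random per-object key
    nonce = os.urandom(NONCE_LEN)
    encrypted = nonce + keystream_xor(session_key, nonce, compressed)
    wrap_nonce = os.urandom(NONCE_LEN)
    wrapped = keystream_xor(volume_key, wrap_nonce, session_key)
    pkesk = key_id(volume_key) + wrap_nonce + wrapped  # stand-in for the PKESK
    return pkesk + encrypted                           # PKESK is prepended


def decrypt_object(obj: bytes, keyring: dict) -> bytes:
    """keyring maps key ID -> (stand-in) secret key."""
    kid = obj[:KEY_ID_LEN]
    if kid not in keyring:
        raise KeyError("no secret key matches the key ID in the PKESK")
    offset = KEY_ID_LEN
    wrap_nonce = obj[offset:offset + NONCE_LEN]
    offset += NONCE_LEN
    session_key = keystream_xor(keyring[kid], wrap_nonce, obj[offset:offset + 32])
    offset += 32
    nonce, body = obj[offset:offset + NONCE_LEN], obj[offset + NONCE_LEN:]
    literal = zlib.decompress(keystream_xor(session_key, nonce, body))
    return literal[len(LITERAL_TAG):]                  # unwrap the original blob
```

Decryption mirrors encryption in reverse order, and fails early when the keyring holds no key whose ID matches the one recorded in the prepended header.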

In OpenPGP terms, each object in the cloud is a full RFC-4880 message, with the innermost piece being the data blob, which is inside the literal data packet, which is inside the compressed data packet, which is inside the encrypted data packet. In this approach, every encrypted data packet is encrypted to a random AES-256 key, which effectively means that every object in the cloud is symmetrically encrypted to a random key.

While the above-described approach is highly secure, rekeying presents challenges. Rekeying refers to the situation when a service customer decides to change the key used on a volume. There can be several reasons for rekeying, including key compromise (the key is stolen, or more simply someone who knows the key leaves the company). Some customers, like banks, may also have regulatory requirements around the need to rekey. To effect rekeying, the PK/SK pair used to encrypt and decrypt the PKESKs is changed, but typically rekeying only applies to new objects in the cloud. Existing objects at rest are not affected and continue to be encrypted to the old key. As objects are deleted, modified, and added, the new key is used more and more, but unless the customer has significant churn of their entire data set (unlikely), and has pruning enabled, there will always be a percentage of data that remains encrypted to the old key. For the archive use case, the old key is likely to remain on a majority of data.

Constant-Time Rekeying

The following section provides details regarding one embodiment of an implementation of a constant-time rekeying (CTR) technique of this disclosure.

According to this disclosure, constant-time rekeying is enabled by implementing a new key management technique that involves the use of a new key in addition to the keys described above. The approach assumes that a customer has associated therewith one or more filers such as described above. The filers may be co-located or geographically distributed, and the filers may share one or more volumes in the cloud data store, all in the manner previously described. The filers also may implement one or more global locks. The customer typically has an associated public key cryptosystem key pair comprising a public key (PK) and an associated secret key (SK). If more than one filer is implemented, typically one filer acts as a master filer, although this is not a requirement. As previously depicted and explained, a filer typically has a volume manager. Metadata about a volume created and managed by the volume manager may be held within a volume metadata file, and the volume details may be replicated to one or more other filers that share the volume through a replication mechanism. With the above as background, the following provides details of a CTR scheme of this disclosure.

In one embodiment, and with reference to FIG. 6, and at volume creation time, a random Intermediate Key (IK) is generated. This is step 600. At step 602, the IK is encrypted using one or more public key(s) for the volume in question, and then at step 604 stored in encrypted form in the volume metadata file (e.g., cloudvolume.xml), typically alongside the other volume information. Once created, preferably the IK is treated like any other volume metadata. By encrypting the IK to the specific volume key, the IK for a given volume cannot be revealed unless the customer key pair (PK/SK) for that particular volume also is present. At step 608, and during startup of a volume manager on the filer, the one or more per-volume IK blobs that are present are decrypted using an appropriate secret key, and then cached in memory. At step 610, the memory chunk containing the cached key is marked as unswappable (e.g., via mlock or similar). Object encryption then proceeds as objects are created and prepared for transmission to object storage in the cloud. To this end, and at step 612, objects sent to the cloud are then symmetrically encrypted to the current IK for the volume. When an object stored in the cloud is required to be returned to the filer, the locally-cached IK is then used for decryption. To this end, and as depicted at step 614, all objects read from the cloud are decrypted using the locally-cached IK.
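The flow of steps 600 through 614 may be sketched as follows, under stated assumptions. The class and method names are hypothetical; a dictionary stands in for the volume metadata file (cloudvolume.xml); a single symmetric volume_key stands in for the PK/SK pair; and keystream_xor is a toy SHA-256-based stream cipher used only so that the sketch runs with the standard library, not the cipher actually used. Marking the cached key unswappable (step 610) is not shown.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT AES; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    return nonce + keystream_xor(key, nonce, plaintext)


def unseal(key: bytes, sealed: bytes) -> bytes:
    return keystream_xor(key, sealed[:16], sealed[16:])


class VolumeManager:
    """Hypothetical volume manager holding metadata and an in-memory IK cache."""

    def __init__(self, metadata=None):
        # Stand-in for cloudvolume.xml; a copy models replicated metadata.
        self.metadata = {} if metadata is None else dict(metadata)
        self._cached_ik = None

    def create_volume(self, volume_key: bytes) -> None:
        ik = os.urandom(32)                                    # step 600: random IK
        self.metadata["encrypted_ik"] = seal(volume_key, ik)   # steps 602 and 604

    def startup(self, volume_key: bytes) -> None:
        # Step 608: decrypt the stored IK blob and cache it in memory.
        self._cached_ik = unseal(volume_key, self.metadata["encrypted_ik"])

    def encrypt_object(self, blob: bytes) -> bytes:
        return seal(self._cached_ik, blob)                     # step 612

    def decrypt_object(self, obj: bytes) -> bytes:
        return unseal(self._cached_ik, obj)                    # step 614
```

A second VolumeManager constructed from a copy of the first manager's metadata, and started with the same volume key, recovers the same IK and can read the objects the first one wrote.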

To enable filers that share the volume (using the PK/SK) to access the key, preferably the encrypted IK is added to replication metadata (e.g., replication.xml) so it can be distributed to the remote filers mounting the volume in question. This is depicted as step 616 in FIG. 6. Once a remote filer receives the IK (e.g., via replication.xml), the IK is inserted into that filer's metadata file (cloudvolume.xml). When replicated, the IK is still encrypted, and thus the replicated IK received at a filer is decrypted as necessary using the shared PK/SK.

The following summarizes the basic operation to provision and use constant-time rekeying. When a customer decides it wants to rekey, it provides a new PK/SK key pair to the filer. Internally, a new IK is generated and encrypted to the new public key (PK). The encrypted IK is given to the volume manager, which then adds it to the metadata file (cloudvolume.xml) and caches it in memory for use in encrypting and decrypting the volume, just as on new volume creation. The previously cached encrypted IK can be discarded. These operations are depicted at step 618 in FIG. 6. As also described, the same encrypted IK is made available (e.g., in replication.xml) for other filers (managed by the PK/SK key pair) to use. The other filers request the PK/SK for this encrypted IK, preferably on an as-needed basis and using a key sharing mechanism. Theoretically, any filer can run the rekey process; because there typically is a master filer for a given volume, however, the rekeying preferably is carried out on the master filer. The approach provides for perfect forward secrecy (PFS), wherein even an exposed IK cannot be used against newer blobs.

In an alternative embodiment, the existing IK is re-encrypted using the new PK/SK pair. In this approach, it is not required to keep track of multiple IKs, although perfect forward secrecy is not achieved.
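The alternative embodiment, in which the existing IK is simply re-encrypted, may be illustrated by the following sketch. It shows why the cost of such a rekey does not depend on the number of stored objects: only the small encrypted IK blob in the metadata is rewritten. As assumptions, old_key and new_key are symmetric stand-ins for the old and new PK/SK pairs, the helpers are a toy SHA-256-based cipher rather than the cipher actually used, and rekey_by_rewrap is a hypothetical name.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT AES; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    return nonce + keystream_xor(key, nonce, plaintext)


def unseal(key: bytes, sealed: bytes) -> bytes:
    return keystream_xor(key, sealed[:16], sealed[16:])


def rekey_by_rewrap(metadata: dict, old_key: bytes, new_key: bytes) -> None:
    """Re-encrypt the existing IK under the new key; no stored object is touched."""
    ik = unseal(old_key, metadata["encrypted_ik"])
    metadata["encrypted_ik"] = seal(new_key, ik)


# Usage: fifty objects encrypted to the IK remain byte-for-byte unchanged.
ik = os.urandom(32)
old_key, new_key = os.urandom(32), os.urandom(32)
metadata = {"encrypted_ik": seal(old_key, ik)}
object_store = {i: seal(ik, b"object %d" % i) for i in range(50)}
snapshot = dict(object_store)
rekey_by_rewrap(metadata, old_key, new_key)
```

After the call, the IK is recoverable only with the new key, while every object in object_store still decrypts with the same IK.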

Representative encryption algorithms for encrypting the IK typically use a selectable symmetric cipher (defaulting to AES256) and basic crypto parameters (CFB, and the like). GnuPG is representative. The particular encryption technique (and/or parameters) utilized, however, is not a limitation of the disclosed technique.

According to a further feature, an existing volume (i.e., a volume whose objects have been stored without encryption with the intermediate key) may also be selectively converted to constant-time rekeying, although this is not a requirement. This is depicted at step 620 in FIG. 6. The steps involved follow those described above (as if creating a new volume) and, in particular, all chunks going forward are then CTR chunks. Existing chunks (those that are not yet rekeyed) still need to be read by the volume manager. By examining the first several bytes of a blob (sometimes referred to as the magic numbers, per Unix convention), the volume manager read code determines if the blob needs to be passed to the encryption function (to apply IK for the volume) or decrypted internally.
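The read-path dispatch on a blob's leading bytes may be sketched as follows. The magic value CTR1 is purely hypothetical, since the text does not give the actual byte values, and the two handler callables stand in for whichever IK-based and legacy decryption routines the volume manager uses.

```python
from typing import Callable

CTR_MAGIC = b"CTR1"  # hypothetical magic number marking an IK-encrypted blob


def classify_blob(blob: bytes) -> str:
    """Inspect the first bytes of a blob to decide which read path applies."""
    return "ctr" if blob[:len(CTR_MAGIC)] == CTR_MAGIC else "legacy"


def read_blob(
    blob: bytes,
    decrypt_with_ik: Callable[[bytes], bytes],
    decrypt_legacy: Callable[[bytes], bytes],
) -> bytes:
    """Route the blob to the IK path or to the pre-conversion path."""
    if classify_blob(blob) == "ctr":
        return decrypt_with_ik(blob[len(CTR_MAGIC):])
    return decrypt_legacy(blob)
```

Because the check reads only a fixed-length prefix, a converted volume can hold a mixture of old and new chunks without any per-object bookkeeping.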

Other methods of rekeying may be used. Although it is not constant-time, a rekey may be accomplished as a cloud-to-cloud migration, decrypting and re-encrypting the data as it is migrated. This alternative approach may be a cloud-to-cloud migration or even a bucket-to-bucket “migration” within the same cloud. In this embodiment, preferably the re-crypt process is restricted to an OpenPGP unwrap and rewrap, i.e., the literal packet and/or compressed packet are maintained, to thereby reduce CPU burden (to de- and re-compress).

Variants

Upon receiving a request (e.g., at a filer that has received an IK-encrypted blob from the global file store) to decrypt a blob, a key index or similar data structure may be used to help disambiguate which IK is intended for use (as presumably the filer has numerous volumes associated therewith).
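One possible shape for such a key index is sketched below. The text does not specify how a blob identifies its IK, so everything here is an assumption made for illustration: each blob is taken to carry an 8-byte key identifier as a prefix, the identifier is taken to be a truncated SHA-256 hash of the IK, and the IKIndex name and methods are hypothetical.

```python
import hashlib
from typing import Optional, Tuple

KEY_ID_LEN = 8  # hypothetical identifier length


def ik_fingerprint(ik: bytes) -> bytes:
    """Hypothetical key identifier: a truncated hash of the IK."""
    return hashlib.sha256(ik).digest()[:KEY_ID_LEN]


class IKIndex:
    """Maps key identifiers to (volume name, IK) for the volumes on a filer."""

    def __init__(self) -> None:
        self._by_id = {}

    def add(self, volume: str, ik: bytes) -> bytes:
        kid = ik_fingerprint(ik)
        self._by_id[kid] = (volume, ik)
        return kid

    def lookup(self, blob: bytes) -> Optional[Tuple[str, bytes]]:
        """Return the (volume, IK) whose identifier prefixes the blob, if any."""
        return self._by_id.get(blob[:KEY_ID_LEN])
```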

The technique may be implemented using security hardware like a trusted platform module (TPM). Such an approach provides an additional layer of security, as the TPM is a hardened device that cannot leak keys once uploaded.

The technique herein provides significant advantages. The approach may be implemented without user-visible operational changes in the way keys are handled (from the customer perspective). The technique can be implemented without change into a network operation center interface for exchanging keys on shared volumes. The technique does not require change in how keys are escrowed. Using this approach, the system can be bootstrapped (e.g., after a DR event) with only the PK/SK pair. A further advantage is that the approach does not require data (to be encrypted) to pass through a pipe, e.g., to a separate crypto engine; rather, in CTR there are no pipes, as encryption occurs in-thread for the volume manager.

While the above describes a particular order of operations performed by certain embodiments of the disclosed subject matter, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.

While the disclosed subject matter has been described in the context of a method or process, the subject matter also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including an optical disk, a CD-ROM, and a magnetic-optical disk, a read-only memory (ROM), a random access memory (RAM), a magnetic or optical card, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. A computer-readable medium having instructions stored thereon to perform the interface functions is tangible.

A given implementation of the disclosed subject matter is software written in a given programming language that runs on a server on an Intel-based hardware platform running an operating system such as Linux. As noted above, the interface may be implemented as well as a virtual machine or appliance, or in any other tangible manner.

While given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like.

In the preferred approach as described, filers do not communicate directly with one another but, instead, communicate through a hub-and-spoke architecture. Thus, the notification mechanism typically leverages the intermediary (e.g., NMC) for passing the queries and responses, as has been described. In an alternative embodiment, and depending on the underlying architecture, some filer-to-filer communication may be implemented.

Having described the subject matter, what is claimed is as follows.

1. A storage-as-a-service system to provide storage for an enterprise, comprising: at least one file system interface associated with the enterprise, wherein the file system interface is configured to represent, to the enterprise, a local file system whose data is stored in an object store associated with a cloud-based storage service provider; the file system interface associated with a volume manager, the volume manager configured to receive a public key pair for a volume, the public key pair comprising a private key and its associated public key, and in response: generate an intermediate key using the public key, add the intermediate key to a volume metadata file, and selectively share the volume metadata file including the intermediate key with one or more remote systems that share the volume managed by a global lock; and the file system interface using the intermediate key to encrypt one or more objects for storage in the object store.
2. The system as described in claim 1 wherein the intermediate key is shared with at least one of the other remote systems via a replication mechanism.
3. The system as described in claim 1 wherein a private key of the public key pair is used to decrypt and recover the intermediate key upon a given occurrence.
4. The system as described in claim 3 wherein the given occurrence is receipt of a request by which an object encrypted by the intermediate key is returned from the object store to the file system interface.
5. The system as described in claim 1 wherein the one or more per-volume intermediate key encrypted objects are decrypted using a private key of the public key pair and then cached in a memory associated with the file system interface.
6. The system as described in claim 1 wherein the intermediate key is discarded upon receipt by the volume manager of an updated public key pair for the volume.
7. The system as described in claim 1 wherein a new public key pair is associated to the volume at a given time or occurrence, and wherein the per-volume intermediate key is then updated based on the new public key pair.
8. The system as described in claim 1 wherein the volume manager applies the operations retroactively to a volume whose objects have been stored without encryption with the intermediate key.